Mining Gene Expression Data Using PCA Based Clustering
Abstract
As the amount of laboratory data in molecular biology and bioinformatics grows exponentially each year, driven by advanced technologies such as DNA microarrays, new efficient and effective clustering methods must be developed to process it. Numerous clustering techniques have been applied to gene expression data to extract biologically significant patterns. However, issues such as clustering quality, the high dimensionality of the input data, and computational efficiency still need to be addressed. A novel hybrid clustering algorithm is proposed that combines Principal Component Analysis (PCA) with an enhanced correlation-based clustering method. PCA is a classical statistical technique for finding patterns in high-dimensional data. The empirical results show that this approach provides more stable clustering performance in terms of quality and efficiency. The resulting clusters offer potential insight into gene function, molecular biological processes and regulatory mechanisms.
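The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of reducing expression profiles with PCA and then grouping genes by correlation. It is not the authors' method: the function names `pca_reduce` and `correlation_clusters`, the greedy seed-based grouping, the threshold of 0.9, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X (genes x conditions) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each condition (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores: genes x n_components


def correlation_clusters(Z, threshold=0.9):
    """Greedy grouping (an illustrative assumption, not the paper's algorithm):
    each gene joins the first cluster whose seed gene it correlates with
    at or above `threshold`; otherwise it becomes the seed of a new cluster."""
    corr = np.corrcoef(Z)                        # Pearson correlation between rows
    labels = np.full(len(Z), -1, dtype=int)
    seeds = []
    for i in range(len(Z)):
        for cluster_id, seed in enumerate(seeds):
            if corr[i, seed] >= threshold:
                labels[i] = cluster_id
                break
        else:
            seeds.append(i)
            labels[i] = len(seeds) - 1
    return labels


# Synthetic example (hypothetical data): 5 genes with rising expression and
# 5 genes with falling expression across 8 conditions, plus small noise.
rng = np.random.default_rng(0)
rising = np.arange(1.0, 9.0)
falling = rising[::-1]
X = np.vstack([np.tile(rising, (5, 1)), np.tile(falling, (5, 1))])
X = X + rng.normal(scale=0.05, size=X.shape)

Z = pca_reduce(X, n_components=3)
labels = correlation_clusters(Z, threshold=0.9)
print(labels)
```

In this sketch the correlation is computed on the PCA scores rather than on the raw profiles, so the cost of the pairwise comparison depends on the number of retained components instead of the number of conditions. Whether the paper's enhanced correlation-based step works this way is not stated in the abstract.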
Related resources
Privacy Preserving Based on PCA Transformation Using Data Perturbation Technique
Privacy-preserving data mining (PPDM), which aims to maintain confidentiality, privacy and security, is one of the biggest research trends. Recent advances in data collection, data dissemination and related technologies have inaugurated a new era of research in which existing data mining algorithms should be reconsidered from a different point of view, that of privacy preservation. We propose a simple PCA based transformation ap...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Clustering Gene Expression Data Using Random Forest Dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, calculated by a distance function. As increa...
NEC for Gene Expression Analysis
The aim of this work is to apply a novel comprehensive machine learning tool for data mining to the preprocessing and interpretation of gene expression data. Furthermore, some visualization facilities are provided. The data mining framework consists of two main parts: a preprocessing phase and a clustering-agglomerating phase. The first phase comprises a noise filtering procedure and a non-linear PCA Neural Net...
Taxonomically Clustering Organisms Based on the Profiles of Gene Sequences Using PCA
The biological implications of bioinformatics can already be seen in various implementations. Biological taxonomy may seem like a simple science in which biologists merely observe similarities among organisms and construct classifications according to those similarities, but it is not so simple. By applying data mining techniques to a gene sequence database, we can cluster the data to find int...